Environment Setting
# Import required packages
library(tidytransit)
library(tidyverse)
library(tmap)
library(ggplot2)
# library(gtfsrouter)
library(here)
library(units)
library(sf)
library(leaflet)
library(tidycensus)
library(plotly)
# library(igraph)
library(tidygraph)
# library(dodgr)
library(leafsync)
# source("https://raw.githubusercontent.com/BonwooKoo/gtfs_to_igraph/master/gtfs_to_igraph.R")
wd <- file.path(Sys.getenv('setwd'),"work/working/School/UA_2022/external/Lab/module_2")
setwd(eval(wd))
What is the General Transit Feed Specification (GTFS)?
Before GTFS, there wasn’t (to the best of my knowledge, at least) a standardized format for transit timetables other associated information. Users of transit data across multiple transit agencies had to deal with different data formats. With GTFS, feeds from different agencies became standardized (although not perfect), making applications of the data easy.
Useful Links:
- tidytransit (i.e., R package) Vignettes
- Google GTFS
- Open Mobility Data - Archive of GTFS feeds from around the world
- transitland - Another archive of GTFS & other feeds
Let’s download some GTFS feed provided by Metropolitan Atlanta Rapid Transit Authority (MARTA).
# This GTFS file is downloaded from
# https://opendata.atlantaregional.com/datasets/marta-gtfs-latest-feed/about
atl <- read_gtfs('MARTA_GTFS_Latest_Feed.zip')
Understand GTFS
GTFS consists of multiple tables and comes in a zip file as a single package. Transit is a complex system that contains multiple components (e.g., routes, stops, service schedules) working together. The table below shows a brief description of what each data.frame contains. This table is taken from Google.
Table 1. Description of tables in GTFS feed
| Table name | Defines |
|---|---|
| agency | (Required) Transit agencies with service represented in this dataset. |
| stops | (Required) Stops where vehicles pick up or drop off riders. Also defines stations and station entrances. |
| routes | (Required) Transit routes. A route is a group of trips that are displayed to riders as a single service. |
| trips | (Required) Trips for each route. A trip is a sequence of two or more stops that occur during a specific time period. |
| stop_times | (Required) Times that a vehicle arrives at and departs from stops for each trip. |
| calendar | Service dates specified using a weekly schedule with start and end dates. This file is required unless all dates of service are defined in calendar_dates.txt. |
| calendar_dates | Exceptions for the services defined in the calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service. |
| fare_attributes | Fare information for a transit agency’s routes. |
| fare_rules | Rules to apply fares for itineraries. |
| shapes | Rules for mapping vehicle travel paths, sometimes referred to as route alignments. |
| frequencies | Headway (time between trips) for headway-based service or a compressed representation of fixed-schedule service. |
| transfer | Rules for making connections at transfer points between routes. |
| pathways | Pathways linking together locations within stations. |
| levels | Levels within stations. |
| feed_into | Dataset metadata, including publisher, version, and expiration information. |
| translations | Translated information of a transit agency. |
| attributions | Specifies the attributions that are applied to the dataset. |
These tables are relational table that are connected through a system of join keys. The schematic below shows which tables are linked to which tables, through which join keys. Understanding this structure is essential in using GTFS.
Figure 1. GTFS relational table structure IMAGE SOURCE: http://tidytransit.r-transit.org/articles/introduction.html
What’s inside atl object
Now, let’s take a look the atl object in which we read
the GTFA feed from MARTA. This object is a list. In it,
names(atl) shows that there are 9 data.frames. Notice that
there are many tables displayed in Table 1 above, but only some of them
are in atl. Notice in Table 1 that only some are required;
others are optional.
typeof(atl)
## [1] "list"
names(atl)
## [1] "agency" "calendar" "calendar_dates" "routes"
## [5] "shapes" "stop_times" "stops" "trips"
## [9] "."
print(head(atl))
## $agency
## # A tibble: 1 × 7
## agency_id agency_name agenc…¹ agenc…² agenc…³ agenc…⁴ agenc…⁵
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 MARTA Metropolitan Atlanta Rapid … http:/… Americ… en (404)8… custse…
## # … with abbreviated variable names ¹agency_url, ²agency_timezone,
## # ³agency_lang, ⁴agency_phone, ⁵agency_email
##
## $calendar
## # A tibble: 4 × 10
## service_id monday tuesday wednesday thursday friday saturday sunday start_date
## <chr> <int> <int> <int> <int> <int> <int> <int> <date>
## 1 2 0 0 0 0 0 0 0 2021-08-14
## 2 3 0 0 0 0 0 1 0 2021-08-14
## 3 4 0 0 0 0 0 0 1 2021-08-14
## 4 5 1 1 1 1 1 0 0 2021-08-14
## # … with 1 more variable: end_date <date>
## # ℹ Use `colnames()` to see all variable names
##
## $calendar_dates
## # A tibble: 3 × 3
## service_id date exception_type
## <chr> <date> <int>
## 1 5 2021-09-06 2
## 2 5 2021-11-25 2
## 3 5 2021-11-26 2
##
## $routes
## # A tibble: 118 × 8
## route_id route_short_name route_lon…¹ route…² route…³ route…⁴ route…⁵ route…⁶
## <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
## 1 15664 1 Marietta B… "" 3 https:… FF00FF ""
## 2 15665 2 Ponce de L… "" 3 https:… 008000 ""
## 3 15666 3 Martin Lut… "" 3 https:… FF8000 ""
## 4 15667 4 Moreland A… "" 3 https:… FF00FF ""
## 5 15668 5 Piedmont R… "" 3 https:… 00FFFF ""
## 6 15669 6 Clifton Ro… "" 3 https:… 008080 ""
## 7 15670 8 North Drui… "" 3 https:… 008000 ""
## 8 15671 9 Boulevard … "" 3 https:… 00FFFF ""
## 9 15672 12 Howell Mil… "" 3 https:… FF00FF ""
## 10 15673 14 14th Stree… "" 3 https:… 00FF00 ""
## # … with 108 more rows, and abbreviated variable names ¹route_long_name,
## # ²route_desc, ³route_type, ⁴route_url, ⁵route_color, ⁶route_text_color
## # ℹ Use `print(n = ...)` to see more rows
##
## $shapes
## # A tibble: 370,776 × 4
## shape_id shape_pt_lat shape_pt_lon shape_pt_sequence
## <chr> <dbl> <dbl> <int>
## 1 119402 33.8 -84.6 1
## 2 119402 33.8 -84.6 2
## 3 119402 33.8 -84.6 3
## 4 119402 33.8 -84.6 4
## 5 119402 33.8 -84.6 5
## 6 119402 33.8 -84.6 6
## 7 119402 33.8 -84.6 7
## 8 119402 33.8 -84.6 8
## 9 119402 33.8 -84.6 9
## 10 119402 33.8 -84.6 10
## # … with 370,766 more rows
## # ℹ Use `print(n = ...)` to see more rows
##
## $stop_times
## # A tibble: 1,033,990 × 5
## trip_id arrival_time departure_time stop_id stop_sequence
## <chr> <time> <time> <chr> <int>
## 1 8044645 14:35:00 14:35:00 903320 1
## 2 8044645 14:36:02 14:36:02 903448 2
## 3 8044645 14:36:51 14:36:51 901144 3
## 4 8044645 14:37:43 14:37:43 904219 4
## 5 8044645 14:38:19 14:38:19 903850 5
## 6 8044645 14:38:39 14:38:39 903664 6
## 7 8044645 14:39:39 14:39:39 213268 7
## 8 8044645 14:40:12 14:40:12 213101 8
## 9 8044645 14:41:35 14:41:35 212926 9
## 10 8044645 14:41:53 14:41:53 211886 10
## # … with 1,033,980 more rows
## # ℹ Use `print(n = ...)` to see more rows
Converting GTFS Into Geospatial Format
The function gtfs_as_sf converts shapes
and stops tables in GTFS data into sf objects. In the
print out below, you will notice that all other tables have not changed,
but ‘shapes’ and ‘stops’ tables are now Simple feature
collection and have added column ‘geometry’ that contains a
series of coordinates. Under the hood, gtfs_as_sf is no
special function; this function uses st_as_sf() to do the
conversion, the same way we have been doing so far in this class.
Even though we have 9 tables but only 2 of them are in sf format, that’s fine. We can still figure out other tables’ spatial properties because we can join the non-sf tables to shapes and/or stops. We can join other tables to ‘shapes’ and ‘stops’ table using join keys.
# Converting GTFS data into sf objects
atlsf <- tidytransit::gtfs_as_sf(atl, crs = 4326)
atlsf$stops %>% head() # Notice that this is POINT
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -84.46982 ymin: 33.75325 xmax: -84.34421 ymax: 33.92086
## Geodetic CRS: WGS 84
## # A tibble: 6 × 4
## stop_id stop_code stop_name geometry
## <chr> <chr> <chr> <POINT [°]>
## 1 907933 27 HAMILTON E HOLMES STATION (-84.4693 33.75455)
## 2 908023 28 WEST LAKE STATION (-84.44533 33.75333)
## 3 907906 39 WEST LAKE STATION (-84.44557 33.75325)
## 4 907907 40 HAMILTON E HOLMES STATION (-84.46982 33.75452)
## 5 908051 53 DUNWOODY STATION (-84.34421 33.92086)
## 6 908054 57 LINDBERGH CENTER STATION (-84.36936 33.82339)
atlsf$shapes %>% head() # This is LINESTRING
## Simple feature collection with 6 features and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -84.55977 ymin: 33.69654 xmax: -84.20872 ymax: 33.96008
## Geodetic CRS: WGS 84
## shape_id geometry
## 1 119402 LINESTRING (-84.55227 33.77...
## 2 119403 LINESTRING (-84.3694 33.821...
## 3 119404 LINESTRING (-84.39192 33.75...
## 4 119405 LINESTRING (-84.50486 33.73...
## 5 119406 LINESTRING (-84.21014 33.85...
## 6 119407 LINESTRING (-84.30635 33.88...
Let’s take a look at their maps of their routes and stops, respectively. These maps (route shapes on the left; stop locations on the right) show the coverage of MARTA.
Notice that I am using leaflet() for mapping instead of tmap(). Of course, you can make the same map with tmap package as well; I am using leaflet() in this module just to expose you to different packages. As you follow along this R Mark Down, feel free to make these maps in whichever package you prefer to use. Through this course, you can use whatever package you like to use for interactive mapping and visualization; it won’t affect your grade in any way.
# Visualize
a <- leaflet(atlsf$shapes) %>% # add data to display
addProviderTiles(providers$CartoDB.DarkMatter) %>% # add BaseMap
addPolylines(weight = 1, color = "red") %>% # add lines
addControl(htmltools::HTML("Route shapes")) # add Title
b <- leaflet(atlsf$stops) %>% # add data to display
addProviderTiles(providers$CartoDB.DarkMatter) %>% # add BaseMap
addCircles(weight = 3, color = "red") %>% # add circles
addControl(htmltools::HTML("Stop shapes")) # add Title
leafsync::sync(a,b)
Calculating Transit Service Quality
There are, of course, many different analyses that can be done using GTFS. Among nearly infinite numbers of analyses that can be done, this class will look at the equity aspect of transit services. There are numerous studies on transit service equity that uses highly sophisticated methods (e.g., this paper). This course will present more simple (but intuitive) methods to evaluate this issue.
Calculate area of service by SES
One of the simplest ways to calculate transit service quality is to look at the areas within each neighborhood that is within a certain distance from transit stops or routes.
In this section, we will
- Extract transit lines (and stops) for rail and bus transit,
- Draw buffers around them,
- Intersect the buffer with Census Tracts,
- Calculate the proportion of each Census Tract that are within the buffers.
Step 1: Extracting lines
GTFS from MARTA contains information about both rails and buses. The
routes table in atl has a column named
route_type, which contains integer values denoting types of
transit service (e.g., bus vs. rail transit). To see what each integer
means, you will need to look at the link to the Reference provided above
(or see below for a screen capture from the Reference).
Reference from Google
routes table contains route_types, which is needed for
separating rail transit from bus transit. shapes table
contains the geographic information needed for geo-operations such as
buffer, intersections, etc. Because routes and
shapes table do not contain a common key, they need to be
joined through trips table as intermediate table.
Notice that there can be multiple trips that runs on the same routes.
If we join trips and routes, one rows in
routes table will be matched with multiple rows in
trips table and create duplicated.
# Join routes table with trips table with shapes table
trip_routes <- atl$trips %>%
full_join(atl$routes, by = "route_id")
trip_shape <- atlsf$shapes %>%
full_join(atl$trips %>%
select(shape_id, trip_id),
by = "shape_id")
# Merging the two into one and then taking only one row for each
# unique combination of route_id and shape_id.
route_trip_shape <- trip_shape %>%
select(-shape_id) %>%
full_join(trip_routes, by = c("trip_id")) %>%
group_by(shape_id, route_id) %>%
slice(1)
# Route type is not really intuitive - let's fix that
route.shape <- route_trip_shape %>%
mutate(route_type = case_when(
route_type == "0" ~ 'Tram, Streetcar',
route_type == "1" ~ 'Subway, Metro',
route_type == "2" ~ 'Rail',
route_type == "3" ~ 'Bus'
))
pal <- leaflet::colorFactor(c("red", "orange", "pink"), domain = route.shape$route_type)
route.shape %>%
leaflet::leaflet(data = .) %>%
leaflet::addProviderTiles(providers$CartoDB.DarkMatter) %>%
leaflet::addPolylines(color = ~pal(route_type),
weight = 3,
opacity = 0.9,
popup = paste0("Route type: ", route.shape$route_type))
Step 2: Drawing buffers
We can use st_buffer() function from sf package to
draw buffers. Drawing buffers is a simple process, but there is a caveat
to it which is rooted in the recent change in sf package. Until Version
1.0, the sf package used ‘equirectangular projection’ using GEOS library
for many operations that involve, for example,
st_intersects, st_intersection,
st_union, st_nearest_point,
st_join, etc. What this means is that sf package carried
out these operations assuming that the Earth is a flat surface. It did
so even when the given data is in geographical coordinates.
From Version 1.0, sf package started using S2 spherical geometry for
spatial operations, based on S2 geometry library written by Google. For
our purposes, the difference introduced in Version 1.0. is unlikely to
have huge impacts. However, we may see some glitches here and there. For
example, in the code chunks below I transform route.shape
from crs = 4326 to crs = 26967. If we use crs = 4326, the buffer
polygons will have rough outlines with zig-zag patterns. You can try it
yourself. For more information on what all of these means, you can read
this
post.
# If we do not convert CRS to a projected one, the buffer may generate a ragged boundaries.
# https://r-spatial.github.io/sf/articles/sf7.html#buffers-1
# https://r-spatial.org/r/2020/06/17/s2.html#sf-10-goodbye-flat-earth-welcome-s2-spherical-geometry
sf::sf_use_s2(TRUE) # This is default to TRUE when we load SF package. So this code is not really needed. I added it here just to make it more explicit.
MARTA_buffer <- route.shape %>%
sf::st_transform(crs = 26967) %>%
sf::st_buffer(dist = units::set_units(400, "m"))
# To union the buffer polygons by route_type
MARTA_buffer_group <- MARTA_buffer %>%
group_by(route_type) %>%
summarise()
pal <- colorFactor(palette = c("red", "yellow", "blue"), domain = MARTA_buffer_group$route_type)
leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(data = MARTA_buffer_group %>% st_transform(crs = 4326), col = ~pal(route_type),
popup = MARTA_buffer_group$route_type)
Step 3: Intersect the buffer with Census Tracts,
We will use tidycensus package again to download Census
data at Tract-level.
And then, of course, some data cleaning..
acs2020c <- acs2020 %>%
select(GEOID,
hhinc = hhincE,
r_tot = r_totE,
r_wh = r_whE,
r_bl = r_blE,
tot_hh = tot_hhE,
own_novhc = own_novhcE,
rent_novhc = rent_novhcE) %>%
mutate(pct_wh = r_wh / r_tot,
pct_bl = r_bl / r_tot,
pct_novhc = (own_novhc + rent_novhc)/tot_hh) %>%
# Calculate area of the Census Tract polygons
mutate(area1 = unclass(st_area(.))) %>%
mutate(ln_pop_den = log((r_tot / (area1/1000^2)) + 1)) %>%
filter(!is.na(hhinc), !is.na(r_tot), !is.na(own_novhc))
pal1 <- colorNumeric(palette = "YlOrRd", domain = acs2020c$hhinc)
pal2 <- colorFactor("Spectral", domain = MARTA_buffer_group$route_type)
leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(group = "ACS",
data = acs2020c,
color = "grey",
fillColor = ~pal1(hhinc),
fillOpacity = 0.5,
weight = 1,
popup = leafpop::popupTable(round(st_drop_geometry(acs2020c[,c("hhinc", "pct_wh", "pct_bl")]),2))) %>%
addPolygons(group = "MARTA",
data = MARTA_buffer_group %>%
st_transform(crs = st_crs(acs2020c)),
color = ~pal2(route_type),
weight = 1,
opacity = 0.9) %>%
addLayersControl(
overlayGroups = c("ACS", "MARTA"),
options = layersControlOptions(collapsed = FALSE)
)
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
# Intersect buffer with tract
buffer_per_tract1 <- acs2020c %>%
st_transform(crs = 26967) %>%
st_intersection(MARTA_buffer_group) %>%
filter(route_type == "Bus") %>%
# After Intersection
mutate(subarea = unclass(st_area(.)),
pct_served1 = subarea/area1) %>%
st_transform(crs = 4326)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
buffer_per_tract2 <- acs2020c %>%
left_join(buffer_per_tract1 %>% select(GEOID, pct_served1) %>% st_set_geometry(NULL),
by = "GEOID") %>%
# There are many NAs in the pct_served1 column
mutate(pct_served1 = case_when(is.na(pct_served1) ~ 0,
TRUE ~ pct_served1))
pal <- colorNumeric(palette = "YlOrRd", domain = buffer_per_tract2$pct_served1)
leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(data = buffer_per_tract2,
fillColor = ~pal(pct_served1), fillOpacity = 0.9,
color = "white", opacity = 0.2,
weight = 1) %>%
addLegend("bottomright",
pal = pal,
values = buffer_per_tract2$pct_served1,
title = "% Area within 400m from transit line")
## Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
## Need '+proj=longlat +datum=WGS84'
Be careful when calculating area.
tm_shape( buffer_per_tract %>% mutate(diff = (area2 - area1)/area2)) + tm_polygons(col = 'diff', style = 'quantile')
# Is there any correlation between SES and service area?
var_name <- list(
'hhinc'="Annual Household Income",
'pct_bl'="% Black population",
'pct_novhc'="% Household without vehicle",
'pct_wh'="% White population",
'r_tot'="Total population",
'ln_pop_den' = "Population density"
)
var_labeller <- function(variable, value){
return(var_name[value])
}
na_index <- buffer_per_tract2 %>%
select(hhinc, r_tot, ln_pop_den, starts_with('pct')) %>%
is.na(.) %>%
apply(., 1, function(x) sum(x)==0)
buffer_per_tract2 %>%
filter(na_index) %>%
pivot_longer(cols = c('hhinc', 'pct_wh', 'pct_bl', 'pct_novhc', "r_tot", "ln_pop_den"), names_to = "variable", values_to = "value") %>%
ggplot() +
geom_point(aes(x = pct_served1, y = value), alpha = 0.4) +
facet_wrap(~variable, scales = "free_y", labeller = var_labeller) +
labs(x = "% Area with 400m beffer from transit line") +
theme_bw()
## Warning: The labeller API has been updated. Labellers taking `variable` and
## `value` arguments are now deprecated. See labellers documentation.
test_var <- c("hhinc", "pct_bl", "pct_wh", "pct_novhc", "r_tot", "ln_pop_den")
map_chr(test_var, function(x){
cor.object <- cor.test(buffer_per_tract2[['pct_served1']],
buffer_per_tract2[[x]])
r <- round(cor.object$estimate,3)
t <- round(cor.object$statistic,3)
p <- round(cor.object$p.value,3)
return(paste0("for ",x ,", r=", r, " (t=", t, ", p=", p, ")"))
})
## [1] "for hhinc, r=-0.419 (t=-11.191, p=0)"
## [2] "for pct_bl, r=0.121 (t=2.95, p=0.003)"
## [3] "for pct_wh, r=-0.134 (t=-3.292, p=0.001)"
## [4] "for pct_novhc, r=0.526 (t=15.02, p=0)"
## [5] "for r_tot, r=-0.311 (t=-7.935, p=0)"
## [6] "for ln_pop_den, r=0.526 (t=14.997, p=0)"
Reflecting the frequency differences
Since some stops have more frequent services than others, it is useful to consider the service frequency. Here, we are doing to focus on the morning commute time (7AM to 10 AM) and count the number of departures at each stops.
service_ids <- atl$.$dates_services %>% pull(service_id)
stop_freq <- get_stop_frequency(atl, start_time = 3600*7, end_time = 10*3600, service_ids = service_ids, by_route = T)
stop_freq_sf <- atlsf$stops %>%
left_join(stop_freq, by="stop_id") %>%
filter(!is.na(n_departures))
freq_pal <- colorNumeric("Reds", stop_freq_sf$n_departures)
leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addCircles(data = stop_freq_sf,
fillColor = freq_pal(stop_freq_sf$n_departures),
fillOpacity = 0.8,
weight = 10,
stroke = FALSE,
popup = str_c(stop_freq_sf$stop_name, ": ", stop_freq_sf$n_departures)) %>%
addLegend(position = "bottomright",
pal = freq_pal,
values = stop_freq_sf$n_departures,
title = "Count of Departures <br> between 7AM and 10AM")
Merge frequency by stop information with ACS data
stop_tract <- acs2020c %>%
st_transform(crs = 4326) %>%
st_join(stop_freq_sf) %>%
group_by(GEOID) %>%
summarise(n_departures = sum(n_departures),
hhinc = mean(hhinc),
r_tot = mean(r_tot),
pct_wh = mean(pct_wh),
pct_bl = mean(pct_bl),
pct_novhc = mean(pct_novhc),
r_tot = mean(r_tot),
ln_pop_den = mean(ln_pop_den))
knitr::kable(head(stop_tract))
| GEOID | n_departures | hhinc | r_tot | pct_wh | pct_bl | pct_novhc | ln_pop_den | geometry |
|---|---|---|---|---|---|---|---|---|
| 13063040202 | 365 | 35530 | 2832 | 0.0497881 | 0.7916667 | 0.1889764 | 5.611199 | POLYGON ((-84.43865 33.6168… |
| 13063040203 | 54 | 39500 | 3911 | 0.0352851 | 0.8665303 | 0.0795869 | 7.203325 | POLYGON ((-84.45856 33.5944… |
| 13063040204 | 147 | 44158 | 5045 | 0.0200198 | 0.9197225 | 0.1326844 | 7.508534 | POLYGON ((-84.44988 33.6160… |
| 13063040302 | 237 | 38227 | 5896 | 0.2170963 | 0.5875170 | 0.1170854 | 6.935060 | POLYGON ((-84.3885 33.63184… |
| 13063040306 | 149 | 34667 | 4465 | 0.1265398 | 0.7122060 | 0.0360419 | 6.932523 | POLYGON ((-84.3983 33.61745… |
| 13063040307 | NA | 51039 | 4393 | 0.3271113 | 0.3403141 | 0.0714807 | 7.262021 | POLYGON ((-84.37473 33.6036… |
z <- function(vector){ round((vector - mean(vector))/sd(vector), 3) }
pass_per_stop_pal <- colorQuantile(palette = "Reds", domain = stop_tract$n_departures)
leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addPolygons(data = stop_tract,
fillColor = pass_per_stop_pal(stop_tract$n_departures),
fillOpacity = 0.8,
color = 'grey',
weight = 1,
popup = paste0("ID: ", stop_tract$GEOID, ", Value: ", stop_tract$n_departures)) %>%
addLegend(position = "bottomright",
values = stop_tract$n_departures,
pal = pass_per_stop_pal)
message(str_c("there are ", sum(is.na(stop_tract$n_departures)), " NAs in n_departures"))
## there are 84 NAs in n_departures
plotly::ggplotly(departure_count)
Calculate travel times from one station to all other stations
The tidytransit package offers a very convenient function that
calculates the shortest travel times from a stop to all other stops. Try
?tidytransit::travel_times to see what this function does
and what arguments it takes. Notice that only two out of nine arguments
are required arguments, and the rest are optional. The required
arguments are ‘filtered_stop_times’ and ‘stop_name.’ If you do not
provide the optional arguments, the function will use the pre-populated
values for the calculation.
But if you run the function right away, there will be an error. The error is caused by the fact that there are stations that share names but are far from each other. There may be more than one stations in different parts of the city that share their names, which sometimes actually happen in real world. If duplicate names in our dataset is caused by these legitimate cases, the duplicate names should not be considered as errors. However, if in reality there are no such stations, the duplicate names would be actual flaws in your data.
The travel_times() function calculates travel time based
on station names. This function is designed to throw an error if there
are stops with the same name regardless of whether the
duplicate names are legitimate reflection of the real world or flaws in
the data. One exceptional case when duplicated names don’t cause error
is when the stations with duplicate names are within 300 meters from
each other – in this case, the function considers the two stations may
actually be the same station and merges them.
For all other cases (duplicate names that are 200-meters apart), we need to modify the duplicate names so that the names are actually different. We do that through the following steps:
- For each unique station names in stops table,
calculate how many duplicates there are and, if there are duplicates,
calculate distances to the duplicate stops.
stop_group_distances()function can do this. - Get names of the stops that have duplicates far from each other. If stops with duplicate names are close (e.g., within 200 meters), they may actually be same stations.
- For all stops with duplicates > 200-meter distances, change their station names by appending (1), (2), … (n) where n is the number of duplicates. For example, if there are two “MIDTOWN STATION”s, they will each be “MIDTOWN STATION (1)” and “MIDTOWN STATION (2)”.
- One of the requirements for
travel_times()function is thatatlobject contains a transfer table, which is absent when freshly imported from the hard drive. This is NOT always the case – the transfer table is an optional table in GTFS definition. Some entities include transfer table by default, some don’t. MARTA is among those who don’t include transfer table. We can usegtfs_transfer_table()function from gtfsrouter package to create this table and include it to theatlobject. - Finally, one of the two required arguments, ‘filtered_stop_times’,
takes an output from a function called
tidytransit::filter_stop_times(). This function takes two required arguments: (1) ouratlobject and (2) a date for which we’d like calculate travel time. You can also choose to specify (3) minimum departure time and (4) maximum arrival time.
Finally, you are ready to calculate travel times!
# Step 1:
stop_dist <- stop_group_distances(atlsf$stops, by='stop_name') %>%
# Step 2:
filter(dist_max > 200)
# Step 3
atl$stops <- atl$stops %>%
group_by(stop_name) %>%
mutate(stop_name = case_when(stop_name %in% stop_dist$stop_name ~ paste0(stop_name, " (", seq(1,n()), ")"),
TRUE ~ stop_name))
# Step 4
atl$transfers <- gtfsrouter::gtfs_transfer_table(atl,
d_limit = 200,
min_transfer_time = 120)
# Step 5
am_stop_time <- filter_stop_times(gtfs_obj = atl,
extract_date = "2021-08-14",
min_departure_time = 3600*7, # input unit is in second. So 3500*7 is 7AM
max_arrival_time = 3600*10) # similarly, this is 10AM
# travel_times
trvt <- travel_times(filtered_stop_times = am_stop_time,
stop_name = "MIDTOWN STATION",
return_coords = TRUE)
# ..and visualize the output
trvt_pal <- colorQuantile(palette = "Reds", domain = trvt$travel_time)
leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
addCircles(data = trvt %>% st_as_sf(coords = c("to_stop_lon", "to_stop_lat"), crs = 4326), # converting trvt into sf object on the fly.
fillColor = ~trvt_pal(travel_time), # Define color
stroke = F, # Turn off the outer border lines of circles
radius = 300, # Size of the circle
fillOpacity = 0.7, # Transparency
popup = paste0("Travel Time from <br> <strong> MIDTOWN: ", round(trvt$travel_time/60, 2), " minutes <strong>") %>% # Defines what's displayed on popup
lapply(htmltools::HTML))
On average, you can see that it takes 50.1 minutes from the Midtown Station to other stations. On the map, you can see that it takes longer time to get to stops that are further away from the Midtown Station.
# Average time to get to other stations
print(str_c("On average, it takes ",
round(mean(trvt$travel_time)/60,1),
" minutes to travel from the Midtown Station to all other stations."))
## [1] "On average, it takes 50.1 minutes to travel from the Midtown Station to all other stations."